feat: add collections validation and GFQL support#874
Conversation
|
general comment, validate_mode = 'autofix' seems more likely to produce invalid output and/or result in a collection getting skipped than otherwise. to me its biggest value is typecasting stuff like id=1 to id="1" etc. not sure how much value there is in making it explicit, maybe we just have warn true/false and otherwise handle things silently? |
|
hmm, if they do something like that was mostly added b/c people keep having dirty data that fails arrow conversion and rather seeing it load vs fixing their data/cfg |
|
@lmeyerov regarding general comment above: im thinking maybe instead of
if we feel we need to type cast we might want to just do it anyway strict true or not |
|
the issue with strict true/false is that's closer to what we had before, and users were complaining that they just wanted it to 'work', hence autofix (coerce++). validate=true/false is closer to what you're thinking, while 'autofix' (coerce) is "it'll run, but may not be what you want, but you said wanted soemthing that runs" We therefore have leeway in what autofix does --- we just need to warn (if warn=auto/true), and do what. So the q is... what should it do wrt diff collections errors, if strong opinions about any params in any direction? My default intuition is probably:
|
|
agree with that, i think we do the 3 bullets listed if strict (or whatever we want to call it) is false and its almost what we are doing right now. theres only 1 point i would make about the current implementation
in summary its only the approach i think might need to change, not the intention EDIT: if we do the above it actually gets closer to a true autofix than what it was before, so i dont have an issue with the name anymore |
3cf9e9d to
41a33c4
Compare
aucahuasi
left a comment
There was a problem hiding this comment.
Thanks for this PR, I left some comments and I think we need to solve this feedback before merge:
https://github.com/graphistry/pygraphistry/pull/874/changes#r2778233409
- Remove dead code after return in _normalize_intersection_expr - Add empty sets list validation in _normalize_sets_list - Remove pointless type coercion (str() won't produce valid types) - Consolidate GFQL parsing: _normalize_gfql_ops calls _wrap_gfql_expr - Add cross-validation for intersection set ID references - Require id field for sets (server requires it, no auto-generation) - Use precise exception handling (TypeError, ValueError, GFQLValidationError) - Update tests to include required id fields 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Review Comments Addressed (commits 23d1f00, d2bdf02, 47d87e8)HIGH Priority - Fixed
MEDIUM Priority - Fixed
LOW Priority - Fixed
Still Open / Deferred
Key Design Decisions
CI Status: All jobs green (manual workflow_dispatch run 21793451172) - python-lint-types 3.8-3.14 ✅, all test suites ✅ |
9cc4506 to
47d87e8
Compare
aucahuasi
left a comment
There was a problem hiding this comment.
Thanks for fixing the previous issue, here are some others that caught my attention:
https://github.com/graphistry/pygraphistry/pull/874/changes#r2789010065
https://github.com/graphistry/pygraphistry/pull/874/changes#r2789125932
|
Thanks for the thorough reviews! All comments have been addressed: Consolidation & Architecture:
Validation Improvements:
Code Quality:
CI is green, ready for merge. |
- Remove dead code after return in _normalize_intersection_expr - Add empty sets list validation in _normalize_sets_list - Remove pointless type coercion (str() won't produce valid types) - Consolidate GFQL parsing: _normalize_gfql_ops calls _wrap_gfql_expr - Add cross-validation for intersection set ID references - Require id field for sets (server requires it, no auto-generation) - Use precise exception handling (TypeError, ValueError, GFQLValidationError) - Update tests to include required id fields 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Set collections require id field (no auto-generation) - Intersection cross-validation for set ID references - GFQL parsing consolidation with precise exception handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address review comment #2683393920 - removes unnecessary helper function. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CollectionsInput includes TypedDicts which need to be cast to Dict[str, Any]. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Server requires IDs for all collection types (used as storage keys).
In autofix mode, generate `{type}_{idx}` IDs when missing instead of
dropping the collection. This makes simple use cases "just work" while
still warning about the missing ID.
- Sets get `set_0`, `set_1`, etc.
- Intersections get `intersection_0`, `intersection_1`, etc.
- Strict mode still rejects missing IDs
- Added test for auto-generation behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change from `set_0` to `set-0` for consistency with other ID patterns
in the codebase (e.g., kepler's `dataset-{uuid[:8]}`).
User-provided IDs can be any string - no validation beyond type check.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add normalize_gfql_to_wire() as canonical GFQL→wire implementation - Simplify collections.py (removed 36 lines of duplicate logic) - Simplify validate_collections.py to call compute/ast - Clean import graph: models → compute → helpers/validation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…intersections) - Allow intersections to reference other intersections (backend supports this) - Detect self-references (intersection referencing itself) - Detect cycles in intersection dependencies (A->B->A) - Add tests for DAG validation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
AST class-level from_json methods (ASTLet, ASTRemoteGraph, ASTRef, ASTCall) use bare `assert` for required field checks. These raise AssertionError which was not caught by the validation boundary, causing raw crashes instead of proper validation errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2f12a79 to
3a4f494
Compare
Addressed in follow-up commits and subsequent discussion; all associated review threads are resolved and current head passes CI.
Summary
graphistry/collections.pyand introduce typed models ingraphistry/models/collections.pyKey Features
g.collections(...)API for defining subsets via GFQL expressions with priority-based visual encodingsgraphistry.collection_set(...)andgraphistry.collection_intersection(...)showCollections,collectionsGlobalNodeColor, andcollectionsGlobalEdgeColorURL paramsValidation Behavior
ValueErroron invalid inputidfield (server requirement) - missing IDs are warned/dropped, not auto-generated_wrap_gfql_expras canonical implementation with precise exception handlingTesting
Review Comments Addressed
See comment for detailed breakdown of all review comments and their resolutions.